-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(low-code cdk): add flatten fields #181
Conversation
📝 Walkthrough📝 WalkthroughWalkthroughThis pull request introduces a new Changes
Possibly related PRs
Suggested reviewers
Hey there! 👋 Would you be interested in a quick review of this new 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (6)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
718-721
: LGTM! Would you consider adding docstring documentation?The
FlattenFields
class implementation looks good and follows the established patterns. Consider adding a docstring to describe its purpose and behavior, wdyt?class FlattenFields(BaseModel): + """A transformation that flattens nested records into a single-level format. + + This transformation is useful for sources that require simplified record structures + on their path to manifest. + """ type: Literal["FlattenFields"] parameters: Optional[Dict[str, Any]] = Field(None, alias="$parameters")unit_tests/sources/declarative/transformations/test_flatten_fields.py (2)
12-50
: Consider adding more test cases for edge scenariosThe current test cases cover the basic functionality well, but would you consider adding tests for:
- Empty records
{}
- Records with null values
- Deep nesting (>2 levels)
- Records with special characters in keys
- Performance test with a large nested record
This would help ensure the transformation handles edge cases gracefully. wdyt?
@pytest.mark.parametrize( "input_record, expected_output", [ # Empty record ({}, {}), # Null values ({"nested": {"key": None}}, {"key": None}), # Deep nesting ( {"l1": {"l2": {"l3": {"value": "deep"}}}}, {"value": "deep"} ), # Special characters ({"@special": {"$key": "value"}}, {"$key": "value"}), # Add to existing cases... ] )
51-54
: Consider enhancing test implementation robustnessThe current implementation is clean, but would you consider:
- Verifying that only the expected fields are modified by making a deep copy of the input and comparing non-flattened fields
- Adding error handling tests for invalid inputs
- Adding a test helper to verify the flattening depth
Here's a possible implementation, wdyt?
from copy import deepcopy def test_flatten_fields(input_record, expected_output): original_record = deepcopy(input_record) flattener = FlattenFields() flattener.transform(input_record) assert input_record == expected_output # Verify only expected fields were modified for k, v in original_record.items(): if not isinstance(v, (dict, list)): assert input_record[k] == v @pytest.mark.parametrize("invalid_input", [None, 42, "not_a_dict"]) def test_flatten_fields_error_handling(invalid_input): flattener = FlattenFields() with pytest.raises(ValueError): flattener.transform(invalid_input)airbyte_cdk/sources/declarative/transformations/flatten_fields.py (1)
25-50
: Consider optimizing the flattening algorithmThe current implementation is clean but could be optimized. Would you consider:
- Adding early returns for non-nested records
- Using a more memory-efficient approach for large records
- Adding depth limit to prevent stack overflow
- Handling circular references
Here's a possible optimization, wdyt?
def flatten_record(self, record: Dict[str, Any], max_depth: int = 100) -> Dict[str, Any]: # Early return if no nested structures if not any(isinstance(v, (dict, list)) for v in record.values()): return record.copy() transformed_record = {} stack = [(record, "_", 0)] # (value, parent_key, depth) seen = set() # For circular reference detection while stack: current_record, parent_key, depth = stack.pop() if depth > max_depth: raise ValueError(f"Maximum nesting depth of {max_depth} exceeded") if isinstance(current_record, dict): record_id = id(current_record) if record_id in seen: raise ValueError("Circular reference detected") seen.add(record_id) for current_key, value in current_record.items(): new_key = ( f"{parent_key}.{current_key}" if parent_key != "_" else current_key ) if isinstance(value, (dict, list)): stack.append((value, new_key, depth + 1)) else: transformed_record[new_key] = value elif isinstance(current_record, list): for i, item in enumerate(current_record): stack.append((item, f"{parent_key}.{i}", depth + 1)) return transformed_recordairbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
597-601
: Consider aligning with other transformation implementationsThe implementation is simple but could be more consistent with other transformations in the factory. Would you consider:
- Adding parameter validation similar to other transformations
- Adding type hints for the return value
- Following the pattern used by similar transformations like
create_add_fields
Here's a suggestion, wdyt?
def create_flatten_fields( self, model: FlattenFieldsModel, config: Config, **kwargs: Any ) -> FlattenFields: """Creates a FlattenFields transformation component. Args: model: The model containing the component configuration. config: The connector configuration. **kwargs: Additional keyword arguments. Returns: FlattenFields: The created component. """ return FlattenFields( parameters=model.parameters or {}, )airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
1855-1867
: Consider adding configuration options for customizing the flattening behavior?The schema definition for
FlattenFields
is quite minimal. Would it be helpful to add configuration options to customize the flattening behavior? For example:
separator
- to specify the character used to join nested keysmax_depth
- to limit how deep the flattening goesarray_handling
- to specify how to handle arrays (e.g., 'expand', 'first_element', 'ignore')This would give users more control over the transformation, wdyt?
Also, would you consider adding some examples to help users understand the transformation? Something like:
examples: - input: {"user": {"name": {"first": "John", "last": "Doe"}}} output: {"user_name_first": "John", "user_name_last": "Doe"} - input: {"items": [{"id": 1}, {"id": 2}]} output: {"items_0_id": 1, "items_1_id": 2}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(1 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(1 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(4 hunks)airbyte_cdk/sources/declarative/transformations/flatten_fields.py
(1 hunks)unit_tests/sources/declarative/transformations/test_flatten_fields.py
(1 hunks)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
APPROVED, just left minor comment.
What
Some sources on the road to manifest only require the transformation of their records.
How
Added a
FlattenFields
transformation to update records to a single-level format.Example:
{"FirstName": "John", "LastName": "Doe"}
->{"FirstName": "John", "LastName": "Doe"}
{"123Number": 123, "456Another123": 456}
->{"123Number": 123, "456Another123": 456}
{"NestedRecord": {"FirstName": "John", "LastName": "Doe"}, "456Another123": 456}
->{"FirstName": "John", "LastName": "Doe", "456Another123": 456}
{"ListExample": [{"A": "a"}, {"A": "b"}]}
->{"ListExample.0.A": "a", "ListExample.1.A": "b"}
{"MixedCase123": {"Nested": [{"Key": {"Value": "test1"}}, {"Key": {"Value": "test2"}}]}, "SimpleKey": "SimpleValue"}
->{"Nested.0.Key.Value": "test1", "Nested.1.Key.Value": "test2", "SimpleKey": "SimpleValue"}
{"List": ["Item1", "Item2", "Item3"]}
->{"List.0": "Item1", "List.1": "Item2", "List.2": "Item3"}
Summary by CodeRabbit
New Features
FlattenFields
, allowing users to flatten nested records into a single-level format.FlattenFields
components dynamically.Bug Fixes
Tests
FlattenFields
transformation, covering multiple scenarios to ensure accuracy.